{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dfsDR_omdNea"
      },
      "source": [
        "# Gemma - Minimal RAG\n",
        "\n",
        "This cookbook demonstrates how you can build a minimal Retrieval-Augmented Generation (RAG) system without using any orchestration tool like LangChain or LlamaIndex, or any vector database. The only dependency needed is Google's [UniSim](https://github.com/google/unisim) project as the embedding model and [HtmlChunker](https://github.com/google/labs-prototypes/tree/main/seeds/chunker-python).\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_1]Minimal_RAG.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MwMiP7jDdAL1"
      },
      "source": [
        "## Setup\n",
        "\n",
        "### Select the Colab runtime\n",
        "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n",
        "\n",
        "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
        "2. Select **Change runtime type**.\n",
        "3. Under **Hardware accelerator**, select **T4 GPU**.\n",
        "\n",
        "\n",
        "### Gemma setup on Hugging Face\n",
        "This cookbook uses Gemma 7B instruction tuned model through Hugging Face. So you will need to:\n",
        "\n",
        "* Get access to Gemma on [huggingface.co](huggingface.co) by accepting the Gemma license on the Hugging Face page of the specific model, i.e., [Gemma 7B IT](https://huggingface.co/google/gemma-7b-it).\n",
        "* Generate a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) and configure it as a Colab secret 'HF_TOKEN'."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SGDV_qlIOu2F"
      },
      "source": [
        "## Retrieval-Augmented Generation (RAG)\n",
        "\n",
        "Large Language Models (LLMs) can learn new abilities without directly being trained on them. However, LLMs have been known to \"hallucinate\" when tasked with providing responses for questions they have not been trained on. This is partly because LLMs are unaware of events after training. It is also very difficult to trace the sources from which LLMs draw their responses from. For reliable, scalable applications, it is important that an LLM provides responses that are grounded in facts and is able to cite its information sources.\n",
        "\n",
        "A common approach used to overcome these constraints is called Retrieval Augmented Generation (RAG), which augments the prompt sent to an LLM with relevant data retrieved from an external knowledge base through an Information Retrieval (IR) mechanism. The knowledge base can be your own corpora of documents, databases, or APIs.\n",
        "\n",
        "### Chunking the data\n",
        "\n",
        "To improve the relevance of content returned by the vector database during retrieval, break down large documents into smaller pieces or chunks while ingesting the document.\n",
        "\n",
        "In this cookbook, you will use the [Google I/O 2024 Gemma family expansion launch blog](https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/) as the sample document and Google's [Open Source HtmlChunker](https://github.com/google/labs-prototypes/tree/main/seeds/chunker-python) to chunk it up into passages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pdWNUmA_zcnV"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: google-labs-html-chunker in /usr/local/lib/python3.10/dist-packages (0.0.5)\n",
            "Requirement already satisfied: beautifulsoup4>=4.12.2 in /usr/local/lib/python3.10/dist-packages (from google-labs-html-chunker) (4.12.3)\n",
            "Requirement already satisfied: html5lib>=1.1 in /usr/local/lib/python3.10/dist-packages (from google-labs-html-chunker) (1.1)\n",
            "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4>=4.12.2->google-labs-html-chunker) (2.5)\n",
            "Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.10/dist-packages (from html5lib>=1.1->google-labs-html-chunker) (1.16.0)\n",
            "Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from html5lib>=1.1->google-labs-html-chunker) (0.5.1)\n"
          ]
        }
      ],
      "source": [
        "!pip install google-labs-html-chunker\n",
        "\n",
        "from google_labs_html_chunker.html_chunker import HtmlChunker\n",
        "\n",
        "from urllib.request import urlopen\n",
        "\n",
        "with urlopen(\n",
        "    \"https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/\"\n",
        ") as f:\n",
        "    html = f.read().decode(\"utf-8\")\n",
        "\n",
        "# Chunk the file using HtmlChunker\n",
        "chunker = HtmlChunker(\n",
        "    max_words_per_aggregate_passage=200,\n",
        "    greedily_aggregate_sibling_nodes=True,\n",
        "    html_tags_to_exclude={\"noscript\", \"script\", \"style\"},\n",
        ")\n",
        "passages = chunker.chunk(html)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "myiMmXpFQDSH"
      },
      "source": [
        "Take a look at how the chunked text look like."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0N63HefX-pYa"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Introducing PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit\n",
            "            \n",
            "            \n",
            "            \n",
            "            - Google Developers Blog\n",
            "Products Develop Android Chrome ChromeOS Cloud Firebase Flutter Google Assistant Google Maps Platform Google Workspace TensorFlow YouTube Grow Firebase Google Ads Google Analytics Google Play Search Web Push and Notification APIs Earn AdMob Google Ads API Google Pay Google Play Billing Interactive Media Ads Solutions Events Learn Community Groups Google Developer Groups Google Developer Student Clubs Woman Techmakers Google Developer Experts Tech Equity Collective Programs Accelerator Solution Challenge DevFest Stories All Stories Developer Profile Blog Search English English Español (Latam) Bahasa Indonesia 日本語 한국어 Português (Brasil) 简体中文\n",
            "Products More Solutions Events Learn Community More Developer Profile Blog Develop Android Chrome ChromeOS Cloud Firebase Flutter Google Assistant Google Maps Platform Google Workspace TensorFlow YouTube Grow Firebase Google Ads Google Analytics Google Play Search Web Push and Notification APIs Earn AdMob Google Ads API Google Pay Google Play Billing Interactive Media Ads Groups Google Developer Groups Google Developer Student Clubs Woman Techmakers Google Developer Experts Tech Equity Collective Programs Accelerator Solution Challenge DevFest Stories All Stories English Español (Latam) Bahasa Indonesia 日本語 한국어 Português (Brasil) 简体中文\n",
            "Gemini Introducing PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit MAY 14, 2024 Tris Warkentin Director, Product Management Xiaohua Zhai Senior Staff Research Scientist Ludovic Peran Product Manager Share Facebook Twitter LinkedIn Mail\n",
            "At Google, we believe in the power of collaboration and open research to drive innovation, and we're grateful to see Gemma embraced by the community with millions of downloads within a few short months of its launch. This enthusiastic response has been incredibly inspiring, as developers have created a diverse range of projects like Navarasa , a multilingual variant for Indic languages, to Octopus v2 , an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions. This spirit of exploration and creativity has also fueled our development of CodeGemma , with its powerful code completion and generation capabilities, and RecurrentGemma , offering efficient inference and research possibilities.\n",
            "Link to Youtube Video (visible only when JS is disabled)\n",
            "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Today, we're excited to further expand the Gemma family with the introduction of PaliGemma , a powerful open vision-language model (VLM), and a sneak peek into the near future with the announcement of Gemma 2. Additionally, we're furthering our commitment to responsible AI with updates to our Responsible Generative AI Toolkit, providing developers with new and enhanced tools for evaluating model safety and filtering harmful content.\n",
            "Introducing PaliGemma: Open Vision-Language Model PaliGemma is a powerful open VLM inspired by PaLI-3 . Built on open components including the SigLIP vision model and the Gemma language model, PaliGemma is designed for class-leading fine-tune performance on a wide range of vision-language tasks. This includes image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation. We're providing both pretrained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration. To facilitate open exploration and research, PaliGemma is available through various platforms and resources. Start exploring today with free options like Kaggle and Colab notebooks. Academic researchers seeking to push the boundaries of vision-language research can also apply for Google Cloud credits to support their work. Get started with PaliGemma today. You can find PaliGemma on GitHub, Hugging Face models , Kaggle, Vertex AI Model Garden , and ai.nvidia.com (accelerated with TensoRT-LLM) with easy integration through JAX and Hugging Face Transformers. (Keras integration coming soon) You can also interact with the model via this Hugging Face Space .\n",
            "Screenshot from the HuggingFace Space running PaliGemma\n",
            "Announcing Gemma 2: Next-Gen Performance and Efficiency We're thrilled to announce the upcoming arrival of Gemma 2, the next generation of Gemma models. Gemma 2 will be available in new sizes for a broad range of AI developer use cases and features a brand new architecture designed for breakthrough performance and efficiency, offering benefits such as: Class Leading Performance: At 27 billion parameters, Gemma 2 delivers performance comparable to Llama 3 70B at less than half the size. This breakthrough efficiency sets a new standard in the open model landscape. Reduced Deployment Costs : Gemma 2's efficient design allows it to fit on less than half the compute of comparable models. The 27B model is optimized to run on NVIDIA’s GPUs or can run efficiently on a single TPU host in Vertex AI, making deployment more accessible and cost-effective for a wider range of users.\n",
            "Versatile Tuning Toolchains: Gemma 2 will provide developers with robust tuning capabilities across a diverse ecosystem of platforms and tools. From cloud-based solutions like Google Cloud to popular community tools like Axolotl , fine-tuning Gemma 2 will be easier than ever. Plus, seamless partner integration with Hugging Face and NVIDIA TensorRT-LLM, along with our own JAX and Keras, ensures you can optimize performance and efficiently deploy across various hardware configurations.\n",
            "Gemma 2 is still pretraining. This chart shows performance from the latest Gemma 2 checkpoint along with benchmark pretraining metrics. Source: Hugging Face Open LLM Leaderboard (April 22, 2024) and Grok announcement blog\n",
            "Stay tuned for the official launch of Gemma 2 in the coming weeks! Expanding the Responsible Generative AI Toolkit For this reason we're expanding our Responsible Generative AI Toolkit to help developers conduct more robust model evaluations by releasing the LLM Comparator in open source. The LLM Comparator is a new interactive and visual tool to perform effective side-by-side evaluations to assess the quality and safety of model' responses. To see the LLM Comparator in action, explore our demo showcasing a comparison between Gemma 1.1 and Gemma 1.0.\n",
            "We hope that this tool will advance further the toolkit’s mission to help developers create AI applications that are not only innovative but also safe and responsible. As we continue to expand the Gemma family of open models, we remain dedicated to fostering a collaborative environment where cutting-edge AI technology and responsible development go hand in hand. We're excited to see what you build with these new tools and how, together, we can shape the future of AI.\n",
            "posted in: Gemini AI Announcements Explore Learn Previous Next Related Posts Android Firebase Announcements Documentation Google I/O 2024: What’s new in Android Development Tools May 16, 2024 Google Wallet Mobile Web Announcements Best Practices Everything you need to know about Google Wallet May 16, 2024 Smart Home Mobile Web Announcements Best Practices Home APIs: Enabling all developers to build for the home May 15, 2024 Firebase AI Documentation Problem-Solving How Firebase Genkit helped add AI to our Compass app May 15, 2024 Gemini AI Announcements Community Build for Tomorrow in the Gemini API Developer Competition May 14, 2024\n",
            "Connect Blog Instagram LinkedIn Twitter YouTube Programs Women Techmakers Google Developer Groups Google Developer Experts Accelerators Google Developer Student Clubs Developer consoles Google API Console Google Cloud Platform Console Google Play Console Firebase Console Actions on Google Console Cast SDK Developer Console Chrome Web Store Dashboard\n",
            "Android Chrome Firebase Google Cloud Platform All products Manage cookies Terms Privacy English English Español (Latam) Bahasa Indonesia 日本語 한국어 Português (Brasil) 简体中文\n"
          ]
        }
      ],
      "source": [
        "for passage in passages:\n",
        "    print(passage)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iLBZLeZpQL5t"
      },
      "source": [
        "## Retrieve the relevant chunks"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "19yZsnG1RoTJ"
      },
      "source": [
        "Given a user question 'where can I get PaliGemma?', you will use Unisim to retrieve the relevant chunks.\n",
        "\n",
        "First, compute the similarities between the user question and all the text chunks (passages)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QY9gfGPH2a6I"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: unisim in /usr/local/lib/python3.10/dist-packages (0.0.1)\n",
            "Requirement already satisfied: tabulate in /usr/local/lib/python3.10/dist-packages (from unisim) (0.9.0)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from unisim) (1.25.2)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from unisim) (4.66.4)\n",
            "Requirement already satisfied: onnx in /usr/local/lib/python3.10/dist-packages (from unisim) (1.16.1)\n",
            "Requirement already satisfied: jaxtyping in /usr/local/lib/python3.10/dist-packages (from unisim) (0.2.28)\n",
            "Requirement already satisfied: onnxruntime in /usr/local/lib/python3.10/dist-packages (from unisim) (1.18.0)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from unisim) (2.0.3)\n",
            "Requirement already satisfied: usearch>=2.6.0 in /usr/local/lib/python3.10/dist-packages (from unisim) (2.12.0)\n",
            "Requirement already satisfied: typeguard==2.13.3 in /usr/local/lib/python3.10/dist-packages (from jaxtyping->unisim) (2.13.3)\n",
            "Requirement already satisfied: protobuf>=3.20.2 in /usr/local/lib/python3.10/dist-packages (from onnx->unisim) (3.20.3)\n",
            "Requirement already satisfied: coloredlogs in /usr/local/lib/python3.10/dist-packages (from onnxruntime->unisim) (15.0.1)\n",
            "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.10/dist-packages (from onnxruntime->unisim) (24.3.25)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from onnxruntime->unisim) (24.0)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from onnxruntime->unisim) (1.12)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->unisim) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->unisim) (2023.4)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->unisim) (2024.1)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->unisim) (1.16.0)\n",
            "Requirement already satisfied: humanfriendly>=9.1 in /usr/local/lib/python3.10/dist-packages (from coloredlogs->onnxruntime->unisim) (10.0)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->onnxruntime->unisim) (1.3.0)\n",
            "INFO: Loaded backend\n",
            "INFO: Using TF with GPU\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/keras/src/initializers/initializers.py:120: UserWarning: The initializer RandomNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initializer instance more than once.\n",
            "  warnings.warn(\n",
            "WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "INFO: UniSim is storing a copy of the indexed data\n",
            "INFO: If you are using large data corpus, consider disabling this behavior using store_data=False\n"
          ]
        }
      ],
      "source": [
        "!pip install unisim\n",
        "from unisim import TextSim\n",
        "\n",
        "user_question = \"where can I find PaliGemma?\"\n",
        "\n",
        "text_sim = TextSim()\n",
        "\n",
        "similarities = []\n",
        "for passage in passages:\n",
        "    similarities.append(text_sim.similarity(user_question, passage))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5jSGuCBemtjB"
      },
      "source": [
        "Put the passages and similarities into a dataframe."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BDIiPl3KbNgI"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "summary": "{\n  \"name\": \"results_df\",\n  \"rows\": 17,\n  \"fields\": [\n    {\n      \"column\": \"passage\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 17,\n        \"samples\": [\n          \"Introducing PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit\\n            \\n            \\n            \\n            - Google Developers Blog\",\n          \"Products Develop Android Chrome ChromeOS Cloud Firebase Flutter Google Assistant Google Maps Platform Google Workspace TensorFlow YouTube Grow Firebase Google Ads Google Analytics Google Play Search Web Push and Notification APIs Earn AdMob Google Ads API Google Pay Google Play Billing Interactive Media Ads Solutions Events Learn Community Groups Google Developer Groups Google Developer Student Clubs Woman Techmakers Google Developer Experts Tech Equity Collective Programs Accelerator Solution Challenge DevFest Stories All Stories Developer Profile Blog Search English English Espa\\u00f1ol (Latam) Bahasa Indonesia \\u65e5\\u672c\\u8a9e \\ud55c\\uad6d\\uc5b4 Portugu\\u00eas (Brasil) \\u7b80\\u4f53\\u4e2d\\u6587\",\n          \"Link to Youtube Video (visible only when JS is disabled)\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"similarity\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.09025372632820139,\n        \"min\": 0.2962530851364136,\n        \"max\": 0.5732300877571106,\n        \"num_unique_values\": 17,\n        \"samples\": [\n          0.5173189043998718,\n          0.29951414465904236,\n          0.3335300087928772\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
              "type": "dataframe",
              "variable_name": "results_df"
            },
            "text/html": [
              "\n",
              "  <div id=\"df-970b3860-288e-4111-a17e-e9ba94e39377\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>passage</th>\n",
              "      <th>similarity</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Introducing PaliGemma, Gemma 2, and an Upgrade...</td>\n",
              "      <td>0.517319</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Products Develop Android Chrome ChromeOS Cloud...</td>\n",
              "      <td>0.299514</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Products More Solutions Events Learn Community...</td>\n",
              "      <td>0.296253</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Gemini Introducing PaliGemma, Gemma 2, and an ...</td>\n",
              "      <td>0.508258</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>At Google, we believe in the power of collabor...</td>\n",
              "      <td>0.369846</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>Link to Youtube Video (visible only when JS is...</td>\n",
              "      <td>0.333530</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>Gemma is a family of lightweight, state-of-the...</td>\n",
              "      <td>0.386614</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>Introducing PaliGemma: Open Vision-Language Mo...</td>\n",
              "      <td>0.573230</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>Screenshot from the HuggingFace Space running ...</td>\n",
              "      <td>0.530472</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Announcing Gemma 2: Next-Gen Performance and E...</td>\n",
              "      <td>0.460508</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>Versatile Tuning Toolchains: Gemma 2 will prov...</td>\n",
              "      <td>0.387215</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Gemma 2 is still pretraining. This chart shows...</td>\n",
              "      <td>0.350080</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>Stay tuned for the official launch of Gemma 2 ...</td>\n",
              "      <td>0.304729</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>We hope that this tool will advance further th...</td>\n",
              "      <td>0.313545</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>posted in: Gemini AI Announcements Explore Lea...</td>\n",
              "      <td>0.350449</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>Connect Blog Instagram LinkedIn Twitter YouTub...</td>\n",
              "      <td>0.318327</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>Android Chrome Firebase Google Cloud Platform ...</td>\n",
              "      <td>0.384414</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-970b3860-288e-4111-a17e-e9ba94e39377')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-970b3860-288e-4111-a17e-e9ba94e39377 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-970b3860-288e-4111-a17e-e9ba94e39377');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-c7f38880-c3b7-4d28-bcf5-15f6d4ec8e39\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-c7f38880-c3b7-4d28-bcf5-15f6d4ec8e39')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-c7f38880-c3b7-4d28-bcf5-15f6d4ec8e39 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "  <div id=\"id_c11ef716-9813-4703-82b7-4a73e20a2b50\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('results_df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_c11ef716-9813-4703-82b7-4a73e20a2b50 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('results_df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "                                              passage  similarity\n",
              "0   Introducing PaliGemma, Gemma 2, and an Upgrade...    0.517319\n",
              "1   Products Develop Android Chrome ChromeOS Cloud...    0.299514\n",
              "2   Products More Solutions Events Learn Community...    0.296253\n",
              "3   Gemini Introducing PaliGemma, Gemma 2, and an ...    0.508258\n",
              "4   At Google, we believe in the power of collabor...    0.369846\n",
              "5   Link to Youtube Video (visible only when JS is...    0.333530\n",
              "6   Gemma is a family of lightweight, state-of-the...    0.386614\n",
              "7   Introducing PaliGemma: Open Vision-Language Mo...    0.573230\n",
              "8   Screenshot from the HuggingFace Space running ...    0.530472\n",
              "9   Announcing Gemma 2: Next-Gen Performance and E...    0.460508\n",
              "10  Versatile Tuning Toolchains: Gemma 2 will prov...    0.387215\n",
              "11  Gemma 2 is still pretraining. This chart shows...    0.350080\n",
              "12  Stay tuned for the official launch of Gemma 2 ...    0.304729\n",
              "13  We hope that this tool will advance further th...    0.313545\n",
              "14  posted in: Gemini AI Announcements Explore Lea...    0.350449\n",
              "15  Connect Blog Instagram LinkedIn Twitter YouTub...    0.318327\n",
              "16  Android Chrome Firebase Google Cloud Platform ...    0.384414"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import pandas as pd\n",
        "\n",
        "results_df = pd.DataFrame({\"passage\": passages, \"similarity\": similarities})\n",
        "results_df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "J5Oe5A0cm16y"
      },
      "source": [
        "Identify the top 3 most relevant passages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HW40lOI9cK6d"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "7    Introducing PaliGemma: Open Vision-Language Mo...\n",
              "8    Screenshot from the HuggingFace Space running ...\n",
              "0    Introducing PaliGemma, Gemma 2, and an Upgrade...\n",
              "Name: passage, dtype: object"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "top_3_similarities = results_df.nlargest(3, \"similarity\")\n",
        "top_3_targets = top_3_similarities[\"passage\"]\n",
        "top_3_targets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CqXD_0NyUEtG"
      },
      "source": [
        "Next, assemble a prompt using both the user question and retrieved context."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ODgapzEc8ir-"
      },
      "outputs": [],
      "source": [
        "prompt_template = \"\"\"You are an expert in answering user questions. You always understand user questions well, and then provide high-quality answers based on the information provided in the context.\n",
        "\n",
        "If the provided context does not contain relevant information, just respond \"I could not find the answer based on the context you provided.\"\n",
        "\n",
        "User question: {}\n",
        "\n",
        "Context:\n",
        "{}\n",
        "\"\"\"\n",
        "\n",
        "context = \"\\n\".join(\n",
        "    [f\"{i+1}. {passage}\" for i, passage in enumerate(top_3_targets.iloc[:].tolist())]\n",
        ")\n",
        "prompt = f\"{prompt_template.format(user_question, context)}\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8spJD5iwUt0X"
      },
      "source": [
        "Here is the final prompt that will be sent to Gemma."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lgh36K4TB5Qf"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "You are an expert in answering user questions. You always understand user questions well, and then provide high-quality answers based on the information provided in the context.\n",
            "\n",
            "If the provided context does not contain relevant information, just respond \"I could not find the answer based on the context you provided.\"\n",
            "\n",
            "User question: where can I find PaliGemma?\n",
            "\n",
            "Context:\n",
            "1. Introducing PaliGemma: Open Vision-Language Model PaliGemma is a powerful open VLM inspired by PaLI-3 . Built on open components including the SigLIP vision model and the Gemma language model, PaliGemma is designed for class-leading fine-tune performance on a wide range of vision-language tasks. This includes image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation. We're providing both pretrained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration. To facilitate open exploration and research, PaliGemma is available through various platforms and resources. Start exploring today with free options like Kaggle and Colab notebooks. Academic researchers seeking to push the boundaries of vision-language research can also apply for Google Cloud credits to support their work. Get started with PaliGemma today. You can find PaliGemma on GitHub, Hugging Face models , Kaggle, Vertex AI Model Garden , and ai.nvidia.com (accelerated with TensoRT-LLM) with easy integration through JAX and Hugging Face Transformers. (Keras integration coming soon) You can also interact with the model via this Hugging Face Space .\n",
            "2. Screenshot from the HuggingFace Space running PaliGemma\n",
            "3. Introducing PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit\n",
            "            \n",
            "            \n",
            "            \n",
            "            - Google Developers Blog\n",
            "\n"
          ]
        }
      ],
      "source": [
        "print(prompt)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-p5mlGqllss1"
      },
      "source": [
        "### Generate the answer"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uXLpmtoeU0gx"
      },
      "source": [
        "Now load the Gemma model in quantized 4-bit mode using Hugging Face."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LlQ0qLieCWCc"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: bitsandbytes in /usr/local/lib/python3.10/dist-packages (0.43.1)\n",
            "Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (0.30.1)\n",
            "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from bitsandbytes) (2.3.0+cu121)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from bitsandbytes) (1.25.2)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (24.0)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate) (6.0.1)\n",
            "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.23.1)\n",
            "Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.4.3)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (3.14.0)\n",
            "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (4.11.0)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (1.12)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (3.1.4)\n",
            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (2023.6.0)\n",
            "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (8.9.2.26)\n",
            "Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.3.1)\n",
            "Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (11.0.2.54)\n",
            "Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (10.3.2.106)\n",
            "Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (11.4.5.107)\n",
            "Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.0.106)\n",
            "Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (2.20.5)\n",
            "Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (12.1.105)\n",
            "Requirement already satisfied: triton==2.3.0 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes) (2.3.0)\n",
            "Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->bitsandbytes) (12.5.40)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->accelerate) (2.31.0)\n",
            "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->accelerate) (4.66.4)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->bitsandbytes) (2.1.5)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->accelerate) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->accelerate) (3.7)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->accelerate) (2.0.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->accelerate) (2024.2.2)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->bitsandbytes) (1.3.0)\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "`low_cpu_mem_usage` was None, now set to True since model is quantized.\n",
            "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
            "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
            "`config.hidden_activation` if you want to override this behaviour.\n",
            "See https://github.com/huggingface/transformers/pull/29402 for more details.\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "4f40160582374ee880aad5ee79df0da4",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "!pip install bitsandbytes accelerate\n",
        "from transformers import AutoTokenizer\n",
        "import transformers\n",
        "import torch\n",
        "import bitsandbytes, accelerate\n",
        "\n",
        "model = \"google/gemma-7b-it\"\n",
        "\n",
        "tokenizer = AutoTokenizer.from_pretrained(model)\n",
        "pipeline = transformers.pipeline(\n",
        "    \"text-generation\",\n",
        "    model=model,\n",
        "    model_kwargs={\n",
        "        \"torch_dtype\": torch.float16,\n",
        "        \"quantization_config\": {\"load_in_4bit\": True},\n",
        "    },\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gsmvKH-UU8yx"
      },
      "source": [
        "Finally, generate the answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dxPUcDSHDHpw"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py:426: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Sure, here is the answer to the user question:\n",
            "\n",
            "You can find PaliGemma on GitHub, Hugging Face models, Kaggle, Vertex AI Model Garden, and ai.nvidia.com (accelerated with TensoRT-LLM) with easy integration through JAX and Hugging Face Transformers.\n"
          ]
        }
      ],
      "source": [
        "messages = [\n",
        "    {\"role\": \"user\", \"content\": prompt},\n",
        "]\n",
        "prompt = pipeline.tokenizer.apply_chat_template(\n",
        "    messages, tokenize=False, add_generation_prompt=True\n",
        ")\n",
        "outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.1)\n",
        "print(outputs[0][\"generated_text\"][len(prompt) :])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3s9cjDsMWP9g"
      },
      "source": [
        "Gemma is able to provide the correct answer based on the retrieved context.\n",
        "\n",
        "In this cookbook the sample document [Google I/O 2024 Gemma family expansion launch blog](https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/) is pretty short, so after chunking there aren't many passages to search through. To make the cookbook minimal, we did exhaustive search to find the relevant search.\n",
        "\n",
        "In real world use cases, there may be a lot of chunks to go through for a single query, in which case you will need to use Approximate Nearest Neighbor (ANN) for efficiency. This is usually directly supported by vector databases. UniSim also supports ANN, please consult UniSim documentation and its [Colab](https://github.com/google/unisim/blob/main/notebooks/unisim_text_demo.ipynb) on indexing and searching.\n",
        "\n",
        "UniSim team has also created a separate [RAG demo](https://github.com/google/unisim/blob/main/notebooks/unisim-gemma-text_rag_demo.ipynb). Feel free to check it out."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "[Gemma_1]Minimal_RAG.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
